Some recent results have introduced external-memory algorithms to computeself-indexes of a set of strings, mainly via computing the Burrows-WheelerTransform (BWT) of the input strings. The motivations for those results stemfrom Bioinformatics, where a large number of short strings (called reads) areroutinely produced and analyzed. In that field, a fundamental problem is toassemble a genome from a large set of much shorter samples extracted from theunknown genome. The approaches that are currently used to tackle this problemare memory-intensive. This fact does not bode well with the ongoing increase inthe availability of genomic data. A data structure that is used in genomeassembly is the string graph, where vertices correspond to samples and arcsrepresent two overlapping samples. In this paper we address an open problem: todesign an external-memory algorithm to compute the string graph.
展开▼